What is the percentage of people who have a Bachelor's degree?

Education

Question: What is the percentage of people who have a Bachelor's degree?

  1. Get the values of the education column
  2. Remove every value other than "Bachelors"
  3. Count how many "Bachelors" entries are left

Or is there another more efficient method?
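One possibly shorter route, sketched here on a made-up DataFrame (the sample rows are an assumption, not the real CSV): the mean of a boolean mask is the fraction of `True` values, so multiplying it by 100 gives a percentage directly.

```python
import pandas as pd

# Made-up sample data for illustration; the real df comes from the CSV
df = pd.DataFrame({"education": ["Bachelors", "HS-grad", "Bachelors", "Masters"]})

# True where the row is "Bachelors", False everywhere else
is_bachelor = df["education"] == "Bachelors"

# The mean of a boolean Series is the fraction of True values
percentage = is_bachelor.mean() * 100
print(percentage)  # 50.0
```

Note that this divides by every row, including rows where education is missing, so it can differ slightly from a calculation that only counts non-null values.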

# Assume the CSV file is already converted into a dataframe

# Select the education column
column_values = df['education']

# Boolean mask
# It compares each element of `column_values` with the value `"Bachelors"`
boolean_mask = column_values == "Bachelors"

# Using the boolean mask for indexing
# When you use a boolean mask to index a Series or DataFrame, only the elements corresponding to `True` values in the mask are selected.
filtered_series = column_values[boolean_mask]

# Finally count how many times each remaining value occurs (only "Bachelors" is left)
filtered_series.value_counts()

Aight, this was wrong. We were meant to get the percentage.
So, is there a straightforward way in pandas to calculate a percentage?

Okay, no single obvious function turned up, so compute it by hand:
Percentage = (Part / Whole) * 100
Wait, what does value_counts() do?
Okay, it counts how many times each unique value occurs
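To make the behaviour of value_counts() concrete, here is a small sketch on a made-up Series (the sample values are an assumption). It also shows the `normalize=True` option, which returns fractions instead of raw counts and so gets close to a percentage in one call.

```python
import pandas as pd

# Made-up sample values for illustration
education = pd.Series(["Bachelors", "HS-grad", "Bachelors", "Masters"])

# One row per unique value, holding how many times that value occurs
counts = education.value_counts()
print(counts["Bachelors"])  # 2

# normalize=True returns fractions of the non-null total instead of counts
shares = education.value_counts(normalize=True) * 100
bachelors_pct = shares["Bachelors"]
print(bachelors_pct)  # 50.0
```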

So the plan is to find a pandas function that counts every element in a column; that would be the whole.

The number of "Bachelors" entries left in filtered_series would then be the part.

# Select the education column
column_values = df['education']

# The whole: count() returns the number of non-null values in the column
whole = column_values.count()

# Boolean mask
# It compares each element of `column_values` with the value `"Bachelors"`
boolean_mask = column_values == "Bachelors"

# Using the boolean mask for indexing
# When you use a boolean mask to index a Series or DataFrame, only the elements corresponding to `True` values in the mask are selected.
filtered_series = column_values[boolean_mask]

# Finally count the matching rows; count() returns a single number,
# whereas value_counts() would return a Series
part = filtered_series.count()

percentage = (part / whole) * 100
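The same part-over-whole calculation can be wrapped in a small function and tried on a toy DataFrame. This is only a sketch: `percentage_of` is a hypothetical helper name, and the sample rows (including the missing value) are made up to show that count() ignores nulls.

```python
import pandas as pd

def percentage_of(df, column, value):
    """Hypothetical helper: percentage of non-null entries in `column` equal to `value`."""
    column_values = df[column]
    whole = column_values.count()          # number of non-null entries
    part = (column_values == value).sum()  # number of True values in the mask
    return (part / whole) * 100

# Made-up sample data; the None row is excluded from the whole by count()
df = pd.DataFrame({"education": ["Bachelors", "HS-grad", "Bachelors", "Masters", None]})

result = percentage_of(df, "education", "Bachelors")
print(result)  # 50.0
```

Using len(column_values) instead of count() would include the missing row in the whole; which one is right depends on how missing answers should be treated.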